-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[link-training] initial support #925
Conversation
0c4b84a
to
a029855
Compare
|
||
#### PMON xcvrd Considerations | ||
|
||
Given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be the criteria for determining if a module supports link training?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LT is not going to work if there is a retimer in-between. For examples:
- DR transceiver
- LR transceiver
- chip-to-module transceiver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, perhaps I wasn't clear. Is there a list of specific module types you're planning to have that explicitly does (or doesn't) support LT somewhere? If so, how is it defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CR modules only, the details will later be updated to the HLD
Signed-off-by: Dante Su <dante.su@broadcom.com>
a029855
to
d86199f
Compare
|
||
Vendor-specific SAI implementation is not in the scope of this document, but there are some common requirements for SAI: | ||
|
||
1. SAI implementation must return error code if any of the above attributes is not supported, swss and syncd must not crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure that SAI supports sai_query_attribute_capability and use it before calling link training set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments, in the current implementation (i.e. PR for sonic-utilities) it resues the xcvr_capabilities in APPL_DB for this, and we could also duplicate this info to STATE_DB, and the xcvr_capabilities will later be reused in AutoNeg for the AN and speed capability, and the speed capability may changed per port breakout operation, hence it's better to create a new 'TABLE TRANSCEIVER_CAPS' for these dynamic values, if we want to make a copy in STATE_DB. Please feel free to share your thoughts, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ds952811 I believe the SAI capabilities should determine whether xcvrd should publish the capabilities. Please ensure that xcvrd publishes the capabilities only when orchagent has updated STATE DB from SAI that link training is supported by ASIC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments, and based on the conclusion of the earlier meeting, the changes against xcvrd will be dropped, and the config command should directly reference the TRANSCEIVER_INFO of STATE_DB for the ability checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.
Yep, maybe we could create a new table for the capabilities, will add this later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.
Yep, maybe we could create a new table for the capabilities, will add this later
Unfortunately, there is a conflict on this from the comment in AutoNeg PR, which is for the timing constraint in the warm-reboot scenario. Hence the capability will only be initialized when there is a AN or LT config update, that being said the capabilities list will not be initiated during the system startup.
i.e. We're not going to add this into STATE_DB for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
|
||
| Mode | Description | | ||
|:----:|:---------------------------------------------------------------:| | ||
| auto | Enable link-training if applicable to the transceiver attached | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the auto mode as discussed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do, thanks for the reminder
key = PORT_TABLE:port_name ; configuration of the port | ||
; field = value | ||
... | ||
link_training = STRING ; operational link training config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename the variable to reflect if it is admin or oper status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link_training is the admin, and all the operational status will be integrated as enum like string into link_training_status
fvp = swsscommon.FieldValuePairs([ | ||
('type', 'link_training') | ||
]) | ||
port_np.send('port_state_refresh', port, fvp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the sequence diagram. While portsorch is the consumer of this notification who is going to be the producer for this PORT_CTRL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the further integration with CLIs (REST, KLISH...etc.) and avoid possible confusions, it's better to periodically poll the status when LT is enabled, hence this will be updated to a periodical polling
|
||
When **link_training=auto**, swss#orchagent should request syncd to enable port link training only if **LT** is specified in **xcvr_capabilities**. | ||
|
||
### Warmboot Design Impact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add considerations for fast reboot. Currently pmon starts after syncd and with link training in place how will this be impacted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm right about it, the fast-reboot is a case that kernel will be restarted from loading a new ELF file, without BIOS intervention, and this will be a COLD-REBOOT scenario, from the switch silicon, swss and pmon perspective, hence none extra care will be necessary.
In the case of warm-reboot, the APPL_DB and STATE_DB will be restored, and the switch silicon hardware state will not be changed and seamless resumed, all we need is to make sure static pre-emphasis request will not be applied in this scenario. Of course, a system test could provide more details, and I'll update the HLD if any issues are observed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I meant in case of fast-boot, the requirement is to have downtime less than 30 seconds. In normal scenario this wouldn't be a problem as syncd can come up and initialize the ports whereas in this case the xcvrd should publish the capabilities which will be required for link to be up. So the link downtime may be greater than 30 seconds. Please verify this scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments, and based on the conclusion of the earlier meeting, the changes against xcvrd will be dropped, and the config command should directly reference the TRANSCEIVER_INFO of STATE_DB for the ability checks.
String value, the operational port link training failure status. "none" indicates no error is detected, otherwise any of the link training failure status defined in SAI. For example: "none", "lock", "snr" and "timeout". | ||
- link_training_rxstatus: | ||
String value, the operational port link training rx status. any of the link training rx status defined in SAI. For example: "trained", "not-trained". | ||
- xcvr_capabilities: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are all the list of values that can be in xcr_capabilities apart from LT. Please detail it in the HLD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, only 'LT' will be present, and yes, I'll update the HLD for this, thanks for the comment
|
||
#### PMON xcvrd Considerations | ||
|
||
Given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can capture the types of transceivers which you plan to incorporate in code that can support LT vs that cannot it would be great.
What is going to be the action when user enables LT on a transceiver that doesn't support? If we emit error log, capture it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The xcvrd will inspect the 'specification_compliance' and set 'LT' to xcvr_capability if it's a CR module (SR may also works, but some of the transceivers may inappropriately claims itself a SR module, while it's a CuSFP with RJ45.
- In the swss#orchagent, a error/warning will be logged for this, but the operation will still proceed
- In the config command (sonic-utilitie), the users will be blocked by an error indicate this is an unsupported module, and user could always use '-f' to force to run the operation
|
||
Arguments: | ||
interface_name: name of the interface to be configured. e.g: Ethernet0 | ||
mode: link training mode, can be either "auto", "on" or "off" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the mode is auto, what is the value of SAI_PORT_ATTR_LINK_TRAINING_ENABLE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the discussion in the last meeting, the auto mode will be dropped
|
||
Vendor-specific SAI implementation is not in the scope of this document, but there are some common requirements for SAI: | ||
|
||
1. SAI implementation must return error code if any of the above attributes is not supported, swss and syncd must not crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.
To minimize system overhead, instead of periodically fetching the operational link training status, the corresponding fields in the APPL_DB will only be updated in the following events | ||
|
||
- Per-Port link status changes | ||
In this case, only the **link_training_status** will be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it get link training status? By which SAI attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation state could be fetch by the following SAI attributes
- SAI_PORT_ATTR_LINK_TRAINING_FAILURE_STATUS
- SAI_PORT_ATTR_LINK_TRAINING_RX_STATUS
And 'link_training_status' will be updated to display only one of the best appropriate description for the current state.
e.g. 'not_trained' will only be presented when none of failure is reported by SAI_PORT_ATTR_LINK_TRAINING_FAILURE_STATUS, and the LT hardware engine failed to get a fine-tuned pre-emphasis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HLD will later be updated to have this information, it should be done later today
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
@ds952811 What will be the behavior if AN and LT are turned on together? Are there any CLI level blocking planned for it? |
Conclusion: |
Signed-off-by: Dante Su <dante.su@broadcom.com>
- empty upon success | ||
``` | ||
|
||
For deployment considerations, when the link-training is enabled on a transceiver without LT capabilities, this config command should report an error and get aborted, unless '-f' option is specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have removed pmon related sections, I believe this line becomes irrelevant. Can you please remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HLD updated
if autoneg changed: | ||
setPortAutoNeg(port, autoneg) | ||
|
||
if link_training changed and "LT" in xcvr_capabilities: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this relevant since pmon sections are removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HLD updated
admin@sonic:~$ show interfaces link-training status | ||
Interface LT Oper LT Admin Oper Admin Error Description | ||
----------- ----------- ---------- ------ ------- ----------------------- | ||
Ethernet0 trained on up up unsupported transceiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe unsupported transceiver will not be relevant since we removed pmon related sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HLD updated
Arguments: | ||
interface_name: [mandatory] name of the interface to be configured. e.g: Ethernet0 | ||
mode: [mandatory] link training mode, can be either "on" or "off" | ||
-f: [optional] forcing link-training configuration on an unsupported transceiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the -f option since we don't perform transceiver checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HLD updated
Signed-off-by: Dante Su <dante.su@broadcom.com>
|
||
#### System Test cases | ||
|
||
Will leverage sonic-mgmt to test this feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will there be new test cases added for this new feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment, and the HLD is now updated, please check it out and see if it works
Signed-off-by: Dante Su <dante.su@broadcom.com>
8498931
to
8837dc2
Compare
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Will this comment be considered? #925 (comment) ? |
This is part of HLD sonic-net/SONiC#925 #### Why I did it Add link-training support #### How I did it Update SONiC YANG for port link-training support #### Description for the changelog Add "link_training" to sonic-port.yang #### Link to config_db schema for YANG module changes https://github.com/sonic-net/SONiC/wiki/Configuration#port
HLD: sonic-net/SONiC#925 #### What I did Add CLI support for link-trainig #### How I did it 1. portconfig: initial support for link-training config 2. config/main.py: initial support for link-training config 3. intfutil: initial support for link-training show command 4. show/interfaces/__init__.py: initial support for link-training show command #### How to verify it 1. Manual test 2. Ran the Unit-tests to the corresponding changes #### Previous command output (if the output of a command-line utility has changed) #### New command output (if the output of a command-line utility has changed) ``` admin@sonic:~$ sudo config interface link-training Ethernet0 on admin@sonic:~$ sudo config interface link-training Ethernet8 on admin@sonic:~$ sudo config interface link-training Ethernet16 off admin@sonic:~$ sudo config interface link-training Ethernet24 on admin@sonic:~$ sudo config interface link-training Ethernet32 off admin@sonic:~$ show interfaces link-training status Interface LT Oper LT Admin Oper Admin ----------- ----------- ---------- ------ ------- Ethernet0 trained on up up Ethernet8 trained on up up Ethernet16 off off down up Ethernet24 not trained on down up Ethernet32 off off down up Ethernet40 off - down up Ethernet48 off - down up Ethernet56 off - down up Ethernet64 off - down up Ethernet72 off - down up Ethernet80 off - down up Ethernet88 off - down up Ethernet96 off - down up Ethernet104 off - down up Ethernet112 off - down up Ethernet120 off - down up Ethernet128 off - down up Ethernet136 off - down up Ethernet144 off - down up Ethernet152 off - down up Ethernet160 off - down up Ethernet168 off - down up Ethernet176 off - down up Ethernet184 off - down up Ethernet192 off - down up Ethernet200 off - down up Ethernet208 off - down up Ethernet216 off - down up Ethernet224 off - down up Ethernet232 off - down up Ethernet240 off - down up Ethernet248 off - down up admin@sonic:~$ ```
HLD: sonic-net/SONiC#925 #### What I did Add CLI support for link-trainig #### How I did it 1. portconfig: initial support for link-training config 2. config/main.py: initial support for link-training config 3. intfutil: initial support for link-training show command 4. show/interfaces/__init__.py: initial support for link-training show command #### How to verify it 1. Manual test 2. Ran the Unit-tests to the corresponding changes #### Previous command output (if the output of a command-line utility has changed) #### New command output (if the output of a command-line utility has changed) ``` admin@sonic:~$ sudo config interface link-training Ethernet0 on admin@sonic:~$ sudo config interface link-training Ethernet8 on admin@sonic:~$ sudo config interface link-training Ethernet16 off admin@sonic:~$ sudo config interface link-training Ethernet24 on admin@sonic:~$ sudo config interface link-training Ethernet32 off admin@sonic:~$ show interfaces link-training status Interface LT Oper LT Admin Oper Admin ----------- ----------- ---------- ------ ------- Ethernet0 trained on up up Ethernet8 trained on up up Ethernet16 off off down up Ethernet24 not trained on down up Ethernet32 off off down up Ethernet40 off - down up Ethernet48 off - down up Ethernet56 off - down up Ethernet64 off - down up Ethernet72 off - down up Ethernet80 off - down up Ethernet88 off - down up Ethernet96 off - down up Ethernet104 off - down up Ethernet112 off - down up Ethernet120 off - down up Ethernet128 off - down up Ethernet136 off - down up Ethernet144 off - down up Ethernet152 off - down up Ethernet160 off - down up Ethernet168 off - down up Ethernet176 off - down up Ethernet184 off - down up Ethernet192 off - down up Ethernet200 off - down up Ethernet208 off - down up Ethernet216 off - down up Ethernet224 off - down up Ethernet232 off - down up Ethernet240 off - down up Ethernet248 off - down up admin@sonic:~$ ```
HLD: sonic-net/SONiC#925 #### What I did Add CLI support for link-trainig #### How I did it 1. portconfig: initial support for link-training config 2. config/main.py: initial support for link-training config 3. intfutil: initial support for link-training show command 4. show/interfaces/__init__.py: initial support for link-training show command #### How to verify it 1. Manual test 2. Ran the Unit-tests to the corresponding changes #### Previous command output (if the output of a command-line utility has changed) #### New command output (if the output of a command-line utility has changed) ``` admin@sonic:~$ sudo config interface link-training Ethernet0 on admin@sonic:~$ sudo config interface link-training Ethernet8 on admin@sonic:~$ sudo config interface link-training Ethernet16 off admin@sonic:~$ sudo config interface link-training Ethernet24 on admin@sonic:~$ sudo config interface link-training Ethernet32 off admin@sonic:~$ show interfaces link-training status Interface LT Oper LT Admin Oper Admin ----------- ----------- ---------- ------ ------- Ethernet0 trained on up up Ethernet8 trained on up up Ethernet16 off off down up Ethernet24 not trained on down up Ethernet32 off off down up Ethernet40 off - down up Ethernet48 off - down up Ethernet56 off - down up Ethernet64 off - down up Ethernet72 off - down up Ethernet80 off - down up Ethernet88 off - down up Ethernet96 off - down up Ethernet104 off - down up Ethernet112 off - down up Ethernet120 off - down up Ethernet128 off - down up Ethernet136 off - down up Ethernet144 off - down up Ethernet152 off - down up Ethernet160 off - down up Ethernet168 off - down up Ethernet176 off - down up Ethernet184 off - down up Ethernet192 off - down up Ethernet200 off - down up Ethernet208 off - down up Ethernet216 off - down up Ethernet224 off - down up Ethernet232 off - down up Ethernet240 off - down up Ethernet248 off - down up admin@sonic:~$ ```
Why I did this?
In the case of DAC, static pre-calibrated pre-emphasis is rarely available on SONIC, as most of the ODM are expecting this to be done dynamically at runtime via link-training, hence we'll need this feature to improve the link quality
Related PRs:
Signed-off-by: Dante Su dante.su@broadcom.com